fix: clean up async calls with expired leases #2435

Merged: 11 commits, Aug 27, 2024

@matt2e (Collaborator) commented Aug 20, 2024

fixes #2214

Changes:

  • The controller now calls the verb within the lease context, so a failure to heartbeat the lease cancels the gRPC call
  • A new periodic task cleans up async calls that are in the executing state but no longer hold a lease
    • This runs after the lease has been reaped by a separate job
    • Cleaning up involves:
      • Scheduling retries/catches according to the policy
      • Triggering the origin-specific code (e.g. FSM async calls need the next event table to be cleared)

Known issues:

  • If cleaning up an async call fails with a persistent error, every subsequent cleanup attempt will fail in the same way
  • Java does not currently have a way to stop execution of the call that gets canceled, so it will always continue even when the async call's lease expires

@matt2e matt2e added the run-all label (a PR with this label will run the full set of CI jobs in the PR rather than in the merge queue) Aug 20, 2024
@ftl-robot ftl-robot mentioned this pull request Aug 20, 2024
@matt2e matt2e force-pushed the matt2e/reap-async-calls branch from 2652ced to bfb5243 Compare August 21, 2024 02:42
@matt2e matt2e marked this pull request as ready for review August 21, 2024 02:42
@matt2e matt2e requested a review from alecthomas as a code owner August 21, 2024 02:42
@matt2e matt2e requested review from a team and worstell and removed request for a team August 21, 2024 02:42
@alecthomas (Collaborator) commented:

  • Java does not currently have a way to stop execution of the call that gets canceled, so it will always continue even when the async call's lease expires

Can you file a ticket for this?

}
for _, call := range calls {
	callResult := either.RightOf[[]byte]("async call lease expired")
	_, err := s.dal.CompleteAsyncCall(ctx, call, callResult, func(tx *dal.Tx, isFinalResult bool) error {
Collaborator commented:

This should be using the tx, not the dal
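
A toy sketch of why this distinction matters, assuming a hypothetical in-memory `Store` in place of the real DAL: writes made through the transaction handle are discarded when the callback fails, while writes made through the outer store inside the same callback are not.

```go
package main

import "errors"

// Store is a hypothetical in-memory stand-in for the DAL.
type Store struct{ rows map[string]string }

// Tx buffers writes until the surrounding transaction commits.
type Tx struct{ pending map[string]string }

// Set on Store writes immediately, outside any transaction.
func (s *Store) Set(key, value string) { s.rows[key] = value }

// Set on Tx records a write that only becomes visible on commit.
func (t *Tx) Set(key, value string) { t.pending[key] = value }

// WithTx runs fn in a transaction: buffered writes are applied only when fn
// returns nil and are discarded otherwise.
func (s *Store) WithTx(fn func(tx *Tx) error) error {
	tx := &Tx{pending: map[string]string{}}
	if err := fn(tx); err != nil {
		return err // rollback: pending writes are dropped
	}
	for k, v := range tx.pending {
		s.rows[k] = v
	}
	return nil
}

// errCleanup is a sample error used to force a rollback.
var errCleanup = errors.New("cleanup failed")
```

If a callback calls both `tx.Set` and `s.Set` and then returns `errCleanup`, only the `s.Set` write survives, leaving the store in a partially updated state.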

if err != nil {
	return 0, fmt.Errorf("failed to complete zombie async call: %w", err)
}
// TODO: telemetry
Collaborator commented:

Let's do the telemetry in this PR, or file a ticket.

@matt2e (Collaborator, Author) replied:

Added 👍

backend/controller/sql/queries.sql (outdated, resolved)
@matt2e matt2e enabled auto-merge August 23, 2024 00:17
matt2e added 7 commits August 27, 2024 14:01
# Conflicts:
#	backend/controller/sql/querier.go
#	backend/controller/sql/queries.sql.go
# Conflicts:
#	backend/controller/sql/querier.go
#	backend/controller/sql/queries.sql.go
@matt2e matt2e force-pushed the matt2e/reap-async-calls branch from 6ecf04c to 7acb969 Compare August 27, 2024 04:10
@matt2e matt2e force-pushed the matt2e/reap-async-calls branch from c1b588f to b809e59 Compare August 27, 2024 04:23
@matt2e matt2e added this pull request to the merge queue Aug 27, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 27, 2024
@matt2e matt2e added this pull request to the merge queue Aug 27, 2024
Merged via the queue into main with commit 743e146 Aug 27, 2024
76 checks passed
@matt2e matt2e deleted the matt2e/reap-async-calls branch August 27, 2024 04:59
Development

Successfully merging this pull request may close these issues.

Async calls get stuck if controller dies
2 participants